CUDA kernel development begins with the definition of a kernel: a special C++ function designed to execute in parallel on NVIDIA GPUs, which contain massive numbers of cores. These functions are the fundamental units of work in the CUDA programming model, serving as the bridge from serial host logic to massively parallel device execution.
1. The __global__ Qualifier
The __global__ declaration specifier is the required qualifier that instructs the compiler to generate GPU code while keeping the function's entry point visible to the CPU. A function that can be called from the host and executed on the GPU is called a kernel.
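As a minimal sketch of the idea (the function and variable names here are illustrative, not part of the lesson), adding __global__ to an otherwise ordinary C++ function turns it into a kernel:

```cuda
// Illustrative kernel: __global__ marks it as GPU code with a
// host-visible entry point. Each thread scales one array element.
__global__ void scale(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's global index
    if (i < n) {                                    // guard against extra threads
        data[i] *= factor;
    }
}
```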
2. Execution Environment
Kernels are dispatched to and executed on Streaming Multiprocessors (SMs). The SM is the core compute engine of an NVIDIA GPU, managing hundreds of concurrent threads. Each SM processes thread blocks, scheduling their threads onto its processing cores.
Syntax rule: kernels must return void. Because they run asynchronously with respect to the host, they cannot return a value directly to the CPU; results must instead be written to allocated device memory.
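To make this rule concrete, here is an end-to-end sketch (kernel name, data, and launch configuration are illustrative assumptions): the kernel returns void and writes its result into device memory, which the host copies back after synchronizing.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Kernels return void; results flow back through device pointers.
__global__ void square(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * in[i];
}

int main() {
    const int n = 4;
    float h_in[n] = {1.0f, 2.0f, 3.0f, 4.0f}, h_out[n];
    float *d_in, *d_out;
    cudaMalloc((void**)&d_in,  n * sizeof(float));
    cudaMalloc((void**)&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    square<<<1, n>>>(d_in, d_out, n);  // enqueued; the host continues immediately
    cudaDeviceSynchronize();           // wait before reading the results

    cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) printf("%g ", h_out[i]);
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```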
QUESTION 1
What is the primary function of the __global__ specifier?
It defines a function that runs on the CPU but is callable from the GPU.
It defines a kernel that runs on the GPU and is callable from the CPU.
It allocates memory on the GPU's SM cache.
It synchronizes all threads in a block.
✅ Correct!
Correct! __global__ is the bridge used to launch kernels from Host code.
❌ Incorrect
Incorrect. __global__ specifically identifies entry-point kernels for GPU execution called by the Host.
QUESTION 2
Why must CUDA kernels return void?
Because they execute asynchronously and have no direct path to return values to the Host thread.
To save registers on the SM.
Because GPU memory is read-only.
The NVCC compiler does not support float returns.
✅ Correct!
Exactly. Since kernels launch asynchronously, the Host doesn't wait for a return value; results must be written to pointers.
❌ Incorrect
The return value restriction is due to the asynchronous nature of GPU execution and Host-Device separation.
QUESTION 3
Which hardware component is responsible for managing and executing threads in a CUDA kernel?
The PCIe Controller.
The Streaming Multiprocessor (SM).
The Host RAM controller.
The BIOS.
✅ Correct!
Yes, the SM is the core unit on the GPU where threads are scheduled and executed.
❌ Incorrect
The SM (Streaming Multiprocessor) is the heart of GPU compute hardware.
QUESTION 4
What happens when a Host calls a kernel function?
The CPU halts until the GPU finishes processing.
The GPU creates a clone of the function for every available SM.
The kernel is enqueued for execution on the GPU, and the CPU continues to the next instruction.
The CPU performs a context switch to the GPU.
✅ Correct!
Correct. Kernel launches are non-blocking (asynchronous) from the Host's perspective.
❌ Incorrect
CUDA kernels are asynchronous; the CPU does not wait unless explicitly told to do so via synchronization.
QUESTION 5
Which of the following is the correct definition of a CUDA kernel?
A function that executes on the GPU and is invoked from the Host.
A C++ library for file I/O.
A hardware driver for NVIDIA GPUs.
A standard CPU function with the __gpu__ prefix.
✅ Correct!
Perfect. This is the fundamental definition of a kernel in CUDA programming.
❌ Incorrect
A kernel is specifically the code designed to run on the GPU while being launched from the Host.
Module Challenge: Designing a Vector Subtraction Kernel
Applying kernel fundamentals to data transformation.
You are tasked with porting a signal processing routine to the GPU. The core operation subtracts background noise (Vector B) from a signal (Vector A) into a result (Vector C).
Q
1. Write the function signature for a kernel named 'vecSub' that takes three float pointers.
Solution:
__global__ void vecSub(float* A, float* B, float* C)
Q
2. In which hardware unit will the logic inside your 'vecSub' kernel physically reside during execution?
Solution:
The logic will be executed on the Streaming Multiprocessors (SMs) of the NVIDIA GPU.
Q
3. If you attempt to return a float status code from this kernel, why will the compiler throw an error?
Solution:
CUDA kernels must have a void return type because they are executed asynchronously. Any status reporting or data return must be performed through device memory pointers.
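Pulling the challenge answers together, a complete vecSub might look like the sketch below. Note the added element-count parameter n, a bounds-check assumption beyond the three-pointer signature asked for above; the launch configuration in the comments is likewise illustrative.

```cuda
__global__ void vecSub(float* A, float* B, float* C, int n) {
    // Each thread, running on an SM, computes one element of C = A - B.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) C[i] = A[i] - B[i];  // result written to device memory, not returned
}

// Host-side launch (asynchronous: the CPU continues immediately):
//   int threads = 256;
//   int blocks  = (n + threads - 1) / threads;
//   vecSub<<<blocks, threads>>>(d_A, d_B, d_C, n);
//   cudaDeviceSynchronize();  // wait before reading d_C back
```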